A Deduplication Study for Host-side Caches with Dynamic Workloads in Virtualized Data Center Environments
Authors
Abstract
Deduplication is a well-known method that improves storage efficiency and reduces the cost of storage in corporate data centers [3, 4]. For virtualized data centers, and in particular for virtual desktop infrastructure (VDI), centrally managed networked storage can greatly reduce the overall data footprint because virtual machine (VM) disk images have largely the same content. Recent work by Byan et al. [1] suggested employing flash memory-based host-side caches inside VM hypervisors to shed load from the shared storage infrastructure. They demonstrated that such caches can be effective for read-mostly workloads with stable working sets. However, it is not known whether they can be as effective in a dynamic environment, where virtual machines migrate frequently from one physical server (hypervisor) to another or where their working sets change. First, since these caches are large (many hundreds of GB), re-warming them with the active content to reach a steady-state cache hit rate after a VM migrates to another hypervisor can take as much as 12 hours [1]. Second, as each VM disk image is a separate entity, the caches might contain many copies of the same content even though the network-attached shared storage system would store only a single instance, unnecessarily polluting the host-side hypervisor caches and reducing their overall cache hit rate. The goal of our study is to explore the effectiveness of deduplication for large host-side caches in virtualized data center environments running dynamic VDI workloads. To that end, we analyze traces captured from VDI deployments as well as general enterprise workloads [2]. Understanding the intrinsic properties of data duplication can help us design effective host-side cache policies. Previous deduplication systems and studies focused on data mostly at rest, such as backups [4] and archives, or on reducing network traffic across a WAN link.
Our study of eight traces, ranging in duration from two minutes to almost 100 days, shows that deduplication can reduce the data footprint inside host-side caches by as much as 67%. This allows a larger portion of the data set to be cached and improves the effective cache hit rate. More importantly, such increased caching efficiency can relieve load on networked storage systems during I/O-intensive workloads in which most VM instances perform the same operation, such as virus scans, OS patch installs (a.k.a. update storms), and reboots (a.k.a. boot storms).
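To make the idea of a deduplicated host-side cache concrete, the following is a minimal sketch and not the design evaluated in the study. It assumes fixed-size blocks, SHA-256 content fingerprints, and LRU eviction. The class name `DedupCache` and all of its methods are hypothetical names introduced here for illustration. Each distinct block payload is stored once, and every (VM, block address) pair that holds the same content maps to that single copy.

```python
import hashlib
from collections import OrderedDict


class DedupCache:
    """Toy LRU block cache that stores each distinct payload once (illustrative only)."""

    def __init__(self, capacity_blocks):
        if capacity_blocks < 1:
            raise ValueError("capacity_blocks must be at least 1")
        self.capacity = capacity_blocks  # max number of unique payloads held
        self.index = {}                  # (vm_id, lba) -> content fingerprint
        self.refs = {}                   # fingerprint -> set of (vm_id, lba) keys
        self.store = OrderedDict()       # fingerprint -> payload, oldest first

    def put(self, vm_id, lba, data):
        key = (vm_id, lba)
        fp = hashlib.sha256(data).hexdigest()  # assumed fingerprint function
        self._unmap(key)                       # drop any stale mapping for this address
        if fp not in self.store:
            if len(self.store) >= self.capacity:
                self._evict()
            self.store[fp] = data
            self.refs[fp] = set()
        self.store.move_to_end(fp)             # mark payload as most recently used
        self.refs[fp].add(key)
        self.index[key] = fp

    def get(self, vm_id, lba):
        fp = self.index.get((vm_id, lba))
        if fp is None:
            return None                        # cache miss
        self.store.move_to_end(fp)
        return self.store[fp]

    def _unmap(self, key):
        fp = self.index.pop(key, None)
        if fp is not None:
            self.refs[fp].discard(key)
            if not self.refs[fp]:              # last reference gone: free the payload
                del self.refs[fp]
                del self.store[fp]

    def _evict(self):
        fp, _ = self.store.popitem(last=False)  # least recently used payload
        for key in self.refs.pop(fp):
            del self.index[key]

    def footprint_reduction(self):
        """Fraction of cached addresses that did not need their own stored copy."""
        logical = len(self.index)
        return 0.0 if logical == 0 else 1.0 - len(self.store) / logical
```

In this sketch, if several VMs cache the same OS block at the same address of their own disk image, only one payload occupies cache space, which is the kind of footprint reduction the abstract refers to. A real hypervisor cache would also have to handle fingerprinting cost, persistence, and write consistency, none of which are modeled here.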
Similar resources
A Novel Framework Lime Lighting Dedup Over Caches in an Tacit Dope Center
Flash memory-based caches inside VM hypervisors can reduce I/O latencies and offload much of the I/O traffic from network-attached storage systems deployed in virtualized data centers. This paper explores the effectiveness of content deduplication in these large (typically 100s of GB) host-side caches. Previous deduplication studies focused on data mostly at rest in backup and archive ap...
Adaptive Data Reduction Scheme for SSD-based Host-side Caches in VDI Storage
Because of their outstanding performance, SSDs have been deployed in storage systems on a large scale. However, their limited lifespan has always been the critical shortcoming holding back wider SSD adoption. We propose an adaptive data reduction strategy for SSD-based caches in virtualized environments, which can extend the lifespan and capacity of SSDs. The adaptive data...
Small Is Big: Functionally Partitioned File Caching in Virtualized Environments
File cache management is among the most important factors affecting the performance of a cloud computing system. To achieve higher economies of scale, virtual machines are often overcommitted, which creates high memory pressure. Thus it is essential to eliminate duplicate data in the host and guest caches to boost performance. Existing cache deduplication solutions are based on complex algorith...
SLA_Driven Adaptive Resource Allocation for Virtualized Servers
In order to reduce cost and improve efficiency, many data centers adopt virtualization solutions. The advent of virtualization allows multiple virtual machines to be hosted on a single physical server. However, this poses new challenges for resource management. Web workloads, which are dominant in data centers, are known to vary dynamically with time. In order to meet application’s service level agreem...
Host Side Caching: Solutions and Opportunities
Host side caches use a form of storage faster than disk and less expensive than DRAM to deliver the speed demanded by data intensive applications. Today, this form of storage is NAND Flash, complementing a disk-based solution. A host side cache may integrate into an existing application seamlessly. This may be realized by using an infrastructure component (such as a storage stack middleware or ...
Journal title:
Volume, issue:
Pages: -
Publication date: 2013